PM2.5 infographic from New York City Community Air Survey Program
Source: https://a816-dohbesp.nyc.gov/IndicatorPublic/traffic/index.html
This week’s assignment will encompass the following concepts covered in Class 8 lecture & lab:
Class 8 Lecture and Assignment Materials:
Note: utilize the C8 backup data if for any reason the assignment steps for accessing data fail.
Class 8 Lab Materials:
This week’s readings will not be quizzed; there will be no quiz for week 8 assignment themes/readings. To follow, the FEMA reference #1 points to vulnerability mapping themes, indexes and ratings utilized within their hazards program; the #2 reading is a quick overview of ‘weighting’ indicators; and the #3 reading points to the CDC SVI index landing page - a great resource for understanding social vulnerability in the United States.
https://www1.nyc.gov/site/doh/data/data-publications/air-quality-nyc-community-air-survey.page
Directly linked to health outcomes, PM2.5 is particularly pernicious in high urban density geographies like New York City. For this assignment, a PM2.5 indicator will represent the Hazard, while the Vulnerable Population will be a subset of New York City’s total population: youth ages 0-17, a cohort that is particularly susceptible to air pollution, including PM2.5.
In this assignment the end goal is to identify census geographies that are at elevated risk for PM2.5 exposure.
Download and point a new QGIS project to the CDC SVI 2018 dataset for New York City census tracts.
Download the CDC SVI 2018 dataset HERE. Choose 2018 > New York > Census Tracts > Shapefile > Go.
https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html
SVI 2018 for NYC Census Tracts
Metadata located HERE.
For the final map, the source of the data is stated as follows:
Centers for Disease Control and Prevention/ Agency for Toxic Substances and Disease Registry/ Geospatial Research, Analysis, and Services Program. CDC Social Vulnerability Index 2018.
SVI2018_NEWYORK_tract.shp into the QGIS project, and save the project in a local directory of your making:
SVI2018_NEWYORK_tract loaded to QGIS
SVI2018_US_county.shp in order to clean up the table and keep just those records necessary for the mapping. Here we will need all the basic data about the tracts, but don’t need all the SVI index themes. We will need total population and the youth 0-17 population. The youth population in the dataset is simply a replication of the US Census variable theme Persons aged 17 and younger estimate - Table B09001_001E:
SVI 2018 Persons aged 17 and younger estimate - Table B09001_001E
Drop field(s) tool and Select All, then toggle OFF the fields that will be kept:
Drop Fields
Drop Field Exceptions
Remaining fields:Temporary Layer
ESRI projection 102718 - NAD 1983 StatePlane New York Long Island FIPS 3104 Feet
Remaining fields to this local area projection, and create temporary layer, then proceed to save as NYC_tracts.shp in the working directory for the assignment. Close the current project, open a new QGIS project, and import the new NYC_tracts.shp. Note that the new project now has the local projection system attached to both project and layer.
Reprojected
Reproject Features
pct_youth) with the following formula:
"E_AGE17" / "E_TOTPOP" * 100
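For reference, the same arithmetic can be sketched outside QGIS. This is a minimal Python illustration, not part of the assignment steps; the function name and the sample numbers are hypothetical, and returning None for a zero total population is an assumption meant to mirror the NULL values that appear later for unpopulated tracts.

```python
from typing import Optional


def pct_youth(e_age17: float, e_totpop: float) -> Optional[float]:
    """Percent of a tract's population aged 17 and younger."""
    if not e_totpop:
        # Assumption: an unpopulated tract has no defined percentage,
        # so return None rather than dividing by zero.
        return None
    return e_age17 / e_totpop * 100


# Hypothetical tract values, for illustration only.
print(pct_youth(1000, 4000))  # 25.0
print(pct_youth(0, 0))        # None
```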
New Column - pct_youth
Statistics Tool - Main Menu
Note the Q1, Q3, Mean and Max values:
Statistics Tool Result for pct_youth
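The four values noted from the Statistics tool can also be reproduced with Python's standard library, as a rough cross-check. This is only a sketch: the sample list is hypothetical, and the "inclusive" quartile method is an assumption, so results may differ slightly from the QGIS output.

```python
import statistics


def summarize(values):
    """Return Q1, Q3, mean and max for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "q1": q1,
        "q3": q3,
        "mean": statistics.fmean(values),
        "max": max(values),
    }


# Hypothetical pct_youth values, for illustration only.
sample = [10.0, 15.0, 20.0, 25.0, 30.0]
print(summarize(sample))
# {'q1': 15.0, 'q3': 25.0, 'mean': 20.0, 'max': 30.0}
```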
pct_youth variable. Overall this is a roughly normal distribution, but there are outliers to the right where some tracts approach 65%. If these tracts intersect hazard geographies - the locations of high PM2.5 values - they will be the very high risk geographies in the final mapping.
Histogram Result for pct_youth
With the vulnerability variable complete, the analysis area will be isolated to just those tracts within the five boroughs, not New York State at large.
Download and place the NYC borough shapefile in the working directory:
NYC Boroughs overlaid to NYC Census Tracts
Dissolve nybb
analysis_geography.shp:
Clipped Tool Parameters
analysis_geography.shp; note the NULL values for 41 features within the dataset. These geographies had a zero population to start; they are areas of the city that typically do not have populations - parks, waterways, etc. For our purposes, these tracts could be excluded and deemed ‘insufficient data’, or reassigned from NULL to 0 - essentially no vulnerability, as there are no vulnerable populations within those geographies. To follow, the second approach is detailed:
NULL values in the dataset
NULL values selected in Map Canvas
Update Selected Existing Fields to value 0
analysis_geography, we will pivot to the hazard variable. To start, download the NYC Community Air Survey Data:
NYC Community Air Survey Data
aa11_pm300m folder. Within the folder, we will import the raster w001001x.adf which holds the PM2.5 values for the most recent annual recording (2018-2019):
Isolate aa11_pm300m folder
Raster w001001x.adf
Raster Theme and Unit of Measurement
w001001x.adf via the raster type, Data Source Manager:
PM2.5 values in raster format
At this juncture, the values needed for the hazard - annual PM2.5 in µg/m3 - are located in the raster cells, and we need to transfer those values to the census geographies. First note the unit of measurement:
µg/m3 is micrograms of pollutant per cubic meter of ambient air.
Next, note that the EPA ‘acceptable level’ threshold is 12.0 µg/m3, detailed HERE.
If we view a Singleband pseudocolor mapping of the raster in conjunction with the raster histogram, we can see that the bulk of the values fall within the EPA ‘acceptable level’, but there is also a spatial concentration of high values, some of which exceed the EPA level.
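The comparison against the threshold can be expressed as a simple share of cells above 12.0 µg/m3. The sketch below is illustrative only: the cell values are hypothetical, not read from w001001x.adf, and the function name is an assumption.

```python
EPA_ANNUAL_THRESHOLD = 12.0  # µg/m3, the level cited in this tutorial


def share_above(values, threshold=EPA_ANNUAL_THRESHOLD):
    """Fraction of values strictly above the threshold."""
    values = list(values)
    if not values:
        return 0.0
    return sum(v > threshold for v in values) / len(values)


# Hypothetical raster cell values, for illustration only.
cells = [7.2, 8.9, 9.4, 11.8, 12.6]
print(share_above(cells))  # 0.2
```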
Raster Histogram
Concentration Pattern
w001001x.adf overlaid to the analysis_geography, the values of the raster will be transferred to the vector features via Zonal Statistics. We will record the maximum statistic per census tract via the Zonal Statistics tool. In this way, as multiple raster cells may be contained within one census tract, only the cell with the highest value among those contained by a given tract will be recorded. The result will be saved as analysis_geography_pm25, and it will replace analysis_geography for calculations going forward.
Access Zonal Statistics Tool
Create PM25 column using max statistic
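Conceptually, the maximum zonal statistic reduces every raster cell that falls in a tract to one value per tract. The sketch below shows that reduction in plain Python under a simplifying assumption: each cell has already been assigned to a tract ID. The tract IDs and values are hypothetical, and this is not how the QGIS tool is implemented internally.

```python
def zonal_max(cells):
    """Reduce (tract_id, value) pairs to the maximum value per tract."""
    result = {}
    for tract_id, value in cells:
        if value is None:
            continue  # skip cells with no data
        if tract_id not in result or value > result[tract_id]:
            result[tract_id] = value
    return result


# Hypothetical cells, for illustration only.
cells = [("A", 8.4), ("A", 9.1), ("B", 12.4), ("B", None), ("B", 10.0)]
print(zonal_max(cells))  # {'A': 9.1, 'B': 12.4}
```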
PM25max:
Statistics Tool - Main Menu
Note the Q1, Q3, Mean and Max values:
Statistics Tool Result for PM25max
With the variable for vulnerability - pct_youth - and the variable for hazard - PM25max - calculated and located in the same feature (NYC Census Tracts - analysis_geography_pm25), the next step is to rescale the two variables to ranges that are comparable. The pct_youth value is a measurement of youth populations via percentage, while PM25max is a measurement of particulate matter in the air on an annual basis (µg/m3 is micrograms of pollutant per cubic meter of ambient air). The two variables simply have different units and ranges, and we want them to be consistent in order to develop a composite matrix.
To do this, we could export the dataset to software such as R or Python, which offer statistics-intensive packages. There we could run a typical normalization function and produce a new normalized variable for vulnerability and hazard. For our purposes in this assignment, we can simply use typical thematic classification through the natural breaks method across 3 classes per overlay theme, and then utilize a thematic mapping technique called the bivariate choropleth approach.
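For context, one common normalization is min-max rescaling, which maps each variable onto a 0 to 1 range. The sketch below is an assumption about what such a function could look like; the tutorial does not specify a particular method, and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values, for illustration only.
print(min_max_normalize([0.0, 25.0, 50.0]))  # [0.0, 0.5, 1.0]
print(min_max_normalize([6.0, 9.0, 12.0]))   # [0.0, 0.5, 1.0]
```

After rescaling, both variables share the same range, which is the property the classification step below achieves by ranking instead.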
Using the bivariate choropleth approach, two themes are overlaid to achieve a final scale matrix across two dimensions. This type of thematic approach works best with either 9 unique conditions or 16 unique conditions. In other words, the legend will have either 9 unique color conditions (3x3), or 16 conditions (4x4). More conditions can be utilized, but the final map runs the risk of too much visual complication resulting in poor map legibility.
In the following example, food insecurity and obesity each contain 3 ranks (High, Medium, Low), resulting in 9 final conditions in the bivariate map:
Bivariate Map Example
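The way two 3-rank themes yield 9 conditions can be sketched as a small lookup. This is illustrative only: the numbering scheme (1 to 9, hazard rank selecting the row) is an assumption, and in QGIS the combination happens visually through layer blending rather than through a computed field.

```python
def bivariate_class(vulnerability_rank, hazard_rank, n_classes=3):
    """Combine two 1-based ranks into a single condition id (1..n_classes**2)."""
    for rank in (vulnerability_rank, hazard_rank):
        if not 1 <= rank <= n_classes:
            raise ValueError(f"rank must be between 1 and {n_classes}")
    return (hazard_rank - 1) * n_classes + vulnerability_rank


print(bivariate_class(1, 1))  # 1  (low vulnerability, low hazard)
print(bivariate_class(3, 3))  # 9  (high vulnerability, high hazard)
```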
analysis_geography_pm25 which will be duplicated in the Layers Panel. Save the project before proceeding.
An effective bivariate map requires appropriate color overlay values, with some overlay combinations being particularly effective. In the following two blog posts, various combinations and approaches are discussed in detail:
Recommended Color Schemes for Bivariate Mapping
For this tutorial guide, the second option has been picked utilizing the following Hex Code Values per Layer:
Vulnerability
Rank 1 - #e8e8e8, Rank 2 - #ace4e4, Rank 3 - #5ac8c8
Hazard
Rank 1 - #e8e8e8, Rank 2 - #dfb0d6, Rank 3 - #be64ac
To apply the hex codes per rank value, simply click on a rank color in the Layers Panel. For instance, in the image capture below, categorical values of 1, 2 and 3 exist for each theme; the 1 value is clicked and the Rank 1 hex code, #e8e8e8, is applied.
Apply Hex Code in HTML notation dialog box
pct_youth and PM25max continuous values, the following breaks are achieved:
Natural Breaks, 3 classes applied to each variable layer
Natural Breaks, 3 classes applied to each variable layer
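To give a sense of what a natural breaks classification is optimizing, the sketch below brute-forces the two break points that minimize the within-class sum of squared deviations for 3 classes. It is a conceptual illustration for small samples only; the implementation QGIS uses is assumed to differ, so breaks from this sketch may not match the Layers Panel exactly. The sample values are hypothetical.

```python
from itertools import combinations


def sum_sq_dev(chunk):
    """Sum of squared deviations from the chunk mean."""
    mean = sum(chunk) / len(chunk)
    return sum((v - mean) ** 2 for v in chunk)


def natural_breaks_3(values):
    """Return the upper bounds of the lowest and middle of 3 classes."""
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("need at least 3 values for 3 classes")
    best_cost, best_split = None, None
    # Try every way to cut the sorted data into 3 non-empty chunks.
    for i, j in combinations(range(1, len(data)), 2):
        cost = sum_sq_dev(data[:i]) + sum_sq_dev(data[i:j]) + sum_sq_dev(data[j:])
        if best_cost is None or cost < best_cost:
            best_cost, best_split = cost, (i, j)
    i, j = best_split
    return data[i - 1], data[j - 1]


# Hypothetical values with three obvious clusters.
print(natural_breaks_3([1, 2, 3, 10, 11, 12, 20, 21, 22]))  # (3, 12)
```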
As stated above, the hex codes can now be applied to each layer:
Vulnerability
Rank 1 - #e8e8e8, Rank 2 - #ace4e4, Rank 3 - #5ac8c8
Hazard
Rank 1 - #e8e8e8, Rank 2 - #dfb0d6, Rank 3 - #be64ac
Once applied, the final mapping should appear as follows with layers isolated:
Layer Breaks based on Natural Breaks
Note: while the natural breaks are shown in the Layers Panel, these are essentially ranks 1-3 for the final mapping.
pct_youth isolated
PM25max isolated
Note: changing the stroke color in each layer from black to a neutral gray will better accentuate the classified fill colors in each layer.
Multiply function is applied for Layer Rendering to only the top, first layer. The result of Multiply is seen in Step 11.
Multiply rendering applied to first layer
Multiply function is enacted, resulting in the following map. Here the highest combination value (high youth populations + high PM2.5 values) is represented by deep blue-purple in one tract in Staten Island, one tract in East Williamsburg, and a few in the Bronx including Mott Haven/Port Morris. It also picks up Central Park, which is somewhat misleading because the total population count in this geography is just a few individuals, a good portion of them youth.
Bivariate Mapping Result
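The deep blue-purple follows from how a multiply blend combines colors: each channel of the top color is multiplied by the matching channel of the bottom color and divided by 255. The sketch below applies that standard formula to the two Rank 3 hex codes used in this guide; the function names are hypothetical, and the exact rounding QGIS applies is an assumption, so rendered values may differ by a unit per channel.

```python
def hex_to_rgb(code):
    """Convert '#rrggbb' to a tuple of three 0-255 integers."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def multiply_hex(top, bottom):
    """Blend two hex colors channel by channel using the multiply formula."""
    blended = [
        round(a * b / 255)
        for a, b in zip(hex_to_rgb(top), hex_to_rgb(bottom))
    ]
    return "#{:02x}{:02x}{:02x}".format(*blended)


# Rank 3 vulnerability over Rank 3 hazard.
print(multiply_hex("#5ac8c8", "#be64ac"))  # #434e87
# Rank 1 over Rank 1 stays a light neutral gray.
print(multiply_hex("#e8e8e8", "#e8e8e8"))  # #d3d3d3
```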
One way to solve the Central Park problem would be to select census tracts with very low total populations (for example, less than 20), export those to a new .shp and overlay this new .shp atop the two analysis_geography_pm25 layers, calling out its neutral symbolization as ‘insufficient data’. Consider this an extra step in the assignment if so desired. The downside of this approach is that it creates another layer of visual information that may degrade map legibility.
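The selection logic for this optional step is a simple filter on total population. The sketch below illustrates it on plain dictionaries; the threshold of 20 comes from the example above, while the tract IDs and counts are hypothetical. In QGIS the equivalent would be a Select by Expression on the E_TOTPOP field.

```python
def low_population_tracts(tracts, threshold=20):
    """Return tracts whose total population is below the threshold."""
    return [t for t in tracts if t["E_TOTPOP"] < threshold]


# Hypothetical tract records, for illustration only.
tracts = [
    {"id": "tract_a", "E_TOTPOP": 4, "E_AGE17": 2},
    {"id": "tract_b", "E_TOTPOP": 3150, "E_AGE17": 640},
    {"id": "tract_c", "E_TOTPOP": 0, "E_AGE17": 0},
]
flagged = low_population_tracts(tracts)
print([t["id"] for t in flagged])  # ['tract_a', 'tract_c']
```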
While the bivariate mapping itself is now complete (make sure to save the project), a special, custom legend has to be created. Luckily, there is a QGIS plugin for just this - the Bivariate Legend Plugin. Install the plugin and open the plugin icon from the Main Menu.
pct_youth natural break value classification; and the second layer representing the PM25max natural break value classification. Set the Square Width to 48, choose Multiply as the legend method and click Generate Legend. In the example to follow, Reverse Colors is chosen for both layers, and the X/Y axis has been switched. This results in low values starting in the lower left of the legend.
legend.png:
Export legend as legend.png
Proceed to creating a map layout, and place the map feature into the main map frame. To follow, the legend item will be placed into the map layout. Unlike the legend tool, we import the legend.png as an image file and design around that image file.
To start, use the Add Picture tool to add the element to the map layout, and then point its item properties to the legend.png in your working directory.
Add Picture Tool
Locating the legend.png within Item Properties
Text tool in map layout. In the image capture below, the orientation of text is being altered to vertical position:
Text Orientation
Bivariate Map Legend examples / inspiration
The final map requires the following Items:
A map scale or orientation arrow is not necessary for this type of thematic design.
The following map example shows only the main map feature and bivariate legend completed. It is missing a title and data citation:
Draft of Map Design for Final Deliverable with main map feature + bivariate legend
For data citation, both the NYC Community Air Survey and the Census data - the sources of the two mapped variables - need to be recognized. You may want to state the unit of measurement, as well as the range of each variable. The trick is to provide the map user with just enough information to readily understand what each position in the bivariate map and legend represents without bogging down in unnecessary detail.
For the Title, you want to choose a positioning structure for the two variables in one declarative sentence. To follow are four suggestions for this sentence structure:
Mapping Social Vulnerability Y to Exposure Hazard X
Concentrations of Social Vulnerability Y relative to Exposure Hazard X
Location Vulnerability Y to Exposure Hazard X
Location At Risk: Mapping Vulnerability Y to Exposure Hazard X
Deliverable:
Final Deliverables to Canvas Class 8 Assignment.